perf: reduce ConcurrentDictionary closure allocations in hot paths#5210
perf: reduce ConcurrentDictionary closure allocations in hot paths#5210
Conversation
Add TryGetValue fast paths before GetOrAdd calls to avoid closure allocations and unnecessary ConcurrentDictionary contention on the hot path. Key changes: - TestExecutionGuard: avoid allocating TaskCompletionSource when a test is already executing by checking TryGetValue first - EventReceiverOrchestrator: add TryGetValue fast paths before GetOrAdd for first-test-in-session/assembly/class event tasks and for assembly/class test count counters, avoiding closure allocations from captured state - HookDelegateBuilder: mark GetCachedGenericTypeDefinition lambda as static to prevent implicit closure allocation
There was a problem hiding this comment.
Code Review
Overall, this is a well-reasoned performance optimization PR. The changes are generally correct and the approach is sound. Here are my findings:
✅ — lambda
Clean, correct change. Adding is self-documenting intent and prevents any future accidental closure introduction if the lambda is modified. No concerns.
✅ — TryGetValue fast path
The fast path is correct. Worth noting the subtle thread-safety contract here: calls / before (in ), so any TCS retrieved via is guaranteed to be either:
- Still in-progress (await will block until completion), or
- Already completed (await returns immediately)
Either way, behaves correctly. The optimization is valid — it avoids allocating a on the common path when the test is already tracked.
✅ — First-test event task fast paths
This is the highest-impact change. The closures in do capture , //, and — so every call without the fast path allocates a new closure object on the heap. The bypass avoids this on the hot path (i.e., every test after the first in a session/assembly/class). Correct and well-targeted.
⚠️ Minor: PR description slightly overstates savings for counter dictionaries
In and , the pre-existing lambdas are . In C#, lambdas with no captures compile to cached static delegate fields — the delegate itself is never heap-allocated on repeated calls. So the bypass here does not save a closure/delegate allocation; it only saves the internal hash-table lookup and potential lock acquisition.
This is still a valid win under contention, but the PR description frames these as "closure allocation" savings when they're really "lock contention" savings. Not a correctness issue — just worth being precise for future readers.
🔍 — cold path concern (pre-existing, not introduced here)
The counter dicts (, ) are not cleared during , unlike the first-test task dicts. If is ever called more than once (e.g., for re-runs), the accumulates on top of existing values. This pre-dates this PR, but the refactored code now makes this more visible. Worth confirming the lifecycle contract — is always called exactly once, or should these counters also be reset?
Summary
The core optimization is sound and the implementation is correct. The -before- pattern is a well-established idiom for hot paths. The most impactful part is the first-test event closure avoidance, and the TCS avoidance is clean. The only non-trivial note is the potential double-count concern, which pre-exists this PR but is worth confirming.
There was a problem hiding this comment.
Code Review
Overall, this is a well-reasoned performance optimization PR. The changes are generally correct and the approach is sound. Here are my findings:
HookDelegateBuilder.cs — static lambda
Clean, correct change. Adding static is self-documenting intent and prevents any future accidental closure introduction if the lambda is modified. No concerns.
TestExecutionGuard.cs — TryGetValue fast path
The fast path is correct. Worth noting the subtle thread-safety contract here: ExecuteAndCompleteAsync calls SetResult/SetException before TryRemove (in finally), so any TCS retrieved via TryGetValue is guaranteed to be either in-progress (await blocks) or already completed (await returns immediately). Either way, WaitForExistingExecutionAsync behaves correctly.
The optimization is valid — it avoids allocating a TaskCompletionSource<bool> on the common path when the test is already tracked.
EventReceiverOrchestrator.cs — First-test event task fast paths
This is the highest-impact change. The GetOrAdd closures in InvokeFirstTestIn{Session,Assembly,Class}EventReceiversAsync do capture context, assemblyContext/classContext/sessionContext, and cancellationToken — so every call without the fast path allocates a new closure object on the heap. The TryGetValue bypass avoids this on the hot path (every test after the first in a session/assembly/class). Correct and well-targeted.
Minor: PR description slightly overstates savings for counter dictionaries
In InvokeLastTestIn{Assembly,Class}EventReceiversAsync and InitializeTestCounts, the pre-existing lambdas are static _ => new Counter(). In C#, static lambdas with no captures compile to cached static delegate fields — the delegate itself is never heap-allocated on repeated calls. So the TryGetValue bypass here does not save a closure/delegate allocation; it only saves the GetOrAdd internal hash-table lookup and potential lock acquisition under contention.
This is still a valid win, but the PR description frames these as "closure allocation" savings when they're really "lock contention" savings. Not a correctness issue — just worth being precise for future readers.
InitializeTestCounts — cold path concern (pre-existing, not introduced here)
The counter dicts (_assemblyTestCounts, _classTestCounts) are not cleared during InitializeTestCounts, unlike the first-test task dicts. If InitializeTestCounts is ever called more than once (e.g., re-runs), counter.Add(group.Count()) accumulates on top of existing values. This pre-dates this PR, but the refactored pattern makes it more visible. Worth confirming the lifecycle contract — is InitializeTestCounts always called exactly once, or should these counters also be reset?
Summary
The core optimization is sound and the implementation is correct. The TryGetValue-before-GetOrAdd pattern is a well-established idiom for ConcurrentDictionary hot paths. The most impactful change is the first-test event closure avoidance; the TestExecutionGuard TCS avoidance and the static lambda are clean wins too. The only non-trivial note is a pre-existing potential double-count concern in InitializeTestCounts worth confirming.
| return new ValueTask(existingTask); | ||
| } | ||
|
|
||
| var task = _firstTestInSessionTasks.GetOrAdd("session", |
There was a problem hiding this comment.
@thomhurst FYI there is also an overload for this, which internally does almost exactly the same, see ConcurrentDictionary.cs:1223.
var task = _firstTestInSessionTasks.GetOrAdd("session",
static (_, args) => InvokeFirstTestInSessionEventReceiversCoreAsync(args.Item1, args.Item2, args.Item2),
(context, sessionContext, cancellationToken));
return new ValueTask(task);
Summary
TryGetValuefast path beforeGetOrAddto avoid allocating aTaskCompletionSource<bool>on every call when a test is already executingTryGetValuefast paths beforeGetOrAddcalls for first-test-in-session/assembly/class tasks, avoiding closure allocations from capturedcontext,cancellationToken, and other state on the hot path. Also added fast paths for assembly/class test count counters to reduce lock contention (these usestaticlambdas so the savings are contention-related, not closure-related)GetCachedGenericTypeDefinitionlambda asstaticto prevent implicit closure allocationRationale
Profiling shows ~4.1% exclusive CPU in
ConcurrentDictionaryoperations (TryAddInternal,AcquireAllLocks). These changes reduce allocations and contention by:TryGetValuebeforeGetOrAddso closures capturing state are never created on the fast path (when key already exists)TaskCompletionSourceallocations in the execution guardstaticlambdas (already cached by the compiler, butGetOrAddstill acquires internal locks)staticlambdas where the lambda body doesn't need captured stateNo behavioral changes — only allocation and contention reduction. Thread safety guarantees are preserved since
TryGetValueis lock-free onConcurrentDictionaryand the subsequentGetOrAddhandles the race correctly.Test plan